NWM v2.1 Retrospective Zarr Usage Example

Subset CHRTOUT to gages and re-rechunk for better data access times

James McCreight

TLDR: re-rechunking takes data access time for a single streamflow gage from 2 mins to 80ms.

Sometimes the chunks are not optimized to match your access pattern and you need to re-chunk (re-rechunk?) to maintain your cool. As seen in usage_example_streamflow_timeseries.ipynb, getting a single gage can take about 2 minutes. If you have to do that over and over again for 8000 gages, you could be waiting about 11 days to get it all done. Instead of doing that, rechunk!
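To make that estimate concrete, here is a quick back-of-the-envelope calculation. It only uses the rough figures quoted above (about 2 minutes per gage before, about 80 ms after, 8000 gages) and assumes the gages are read one after another with no parallelism.

```python
# Rough estimate from the timings quoted in this notebook (not a benchmark).
n_gages = 8000          # number of gages to extract
slow_s = 2 * 60         # ~2 minutes per gage with the original chunking
fast_s = 0.080          # ~80 ms per gage after rechunking

# Assumes strictly sequential reads, one gage at a time.
slow_days = n_gages * slow_s / 86400
fast_minutes = n_gages * fast_s / 60
speedup = slow_s / fast_s

print(f"original chunks: {slow_days:.1f} days")
print(f"rechunked:       {fast_minutes:.1f} minutes")
print(f"speedup:         {speedup:.0f}x")
```

Under those assumptions the full set of gages goes from roughly 11 days to roughly 11 minutes.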

This notebook takes the chrtout.zarr store, uses the gage_id variable to subset out just the gages, and then rechunks the subset to optimize access to the full timeseries at each point individually. Note that this is the inverse of the chunking the model uses when it natively writes the "chanobs" files, which consist of all the gages (a single space chunk) in separate files by time (effectively one chunk for each time).
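Why the chunk layout matters so much can be sketched with a little counting. The example below is illustrative only: `read_cost` is a hypothetical helper, and the array and chunk sizes are made-up round numbers, not the actual dimensions or chunk shapes of the chrtout.zarr store. It compares a chanobs-like layout (one chunk per time step holding all gages) with a per-gage layout (one chunk holding a gage's whole timeseries) for the job of reading one gage's full record.

```python
import math

def read_cost(n_time, time_chunk, space_chunk):
    """Chunks opened and values decoded to read ONE gage's full timeseries.

    Hypothetical helper: a single gage sits in exactly one space chunk,
    and every chunk that is touched has to be decoded in full.
    """
    n_chunks = math.ceil(n_time / time_chunk)
    values_decoded = n_chunks * time_chunk * space_chunk
    return n_chunks, values_decoded

# Made-up sizes for illustration (NOT the real store dimensions).
n_time = 100_000   # time steps
n_gages = 8_000    # gages in the subset

# chanobs-like layout: one chunk per time step, all gages in each chunk.
by_time = read_cost(n_time, time_chunk=1, space_chunk=n_gages)

# Per-gage layout: the whole timeseries of one gage in a single chunk.
by_gage = read_cost(n_time, time_chunk=n_time, space_chunk=1)

print("chunked by time:", by_time)
print("chunked by gage:", by_gage)
```

With these assumed sizes, the time-chunked layout opens 100,000 chunks and decodes 8000 times more values than are actually needed, while the per-gage layout opens a single chunk and decodes only the requested timeseries. In practice a compromise (several gages and a long but not complete time span per chunk) is common, since very small chunks carry their own overhead.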

Plot data at a single gage

Bring in observations